Outlier Detection Based on the Distribution of Distances between Data Points
Author
Abstract
A novel approach to outlier detection, based on the properties of the distribution of distances between multidimensional points, is presented. The basic idea is to evaluate an outlier factor for each data point. The factor is used to rank the objects of the dataset by their degree of being an outlier; outliers can then be identified by selecting the points with the minimal factor values. The main advantages of the approach are: (1) no parameters need to be chosen for outlier detection; (2) detection does not depend on clustering algorithms. To demonstrate the quality of the outlier detection, experiments were performed on widely used datasets. A comparison with some popular detection methods shows the superiority of our approach.
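The abstract does not define the outlier factor itself, so the following is only a rough illustration of the "compute a factor per point, rank, and take the minimal values" workflow. It uses a hypothetical factor chosen for this sketch, not the paper's: for each point, the fraction of other points lying within the median of all pairwise distances. Because that radius is read off the distance distribution itself, the user supplies no parameter, and isolated points receive the smallest values. The function names `distance_outlier_factor` and `rank_outliers` are illustrative, and NumPy is assumed to be available.

```python
import numpy as np


def distance_outlier_factor(X):
    """Hypothetical outlier factor (NOT the paper's definition).

    For each point, returns the fraction of the other points that lie within
    the median of all pairwise distances. Small values mean isolated points.
    Assumes at least two points.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Full pairwise Euclidean distance matrix via broadcasting: shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # Global scale taken from the distance distribution (upper triangle only,
    # so each unordered pair is counted once); no user-chosen radius.
    iu = np.triu_indices(n, k=1)
    radius = np.median(D[iu])
    # Count neighbours within the radius; subtract 1 to drop the point itself
    # (D[i, i] == 0 is always within the radius).
    counts = (D <= radius).sum(axis=1) - 1
    return counts / (n - 1)


def rank_outliers(X):
    """Return point indices sorted from most to least outlying, plus factors."""
    factor = distance_outlier_factor(X)
    order = np.argsort(factor, kind="stable")  # minimal factor first
    return order, factor


if __name__ == "__main__":
    # Five clustered points and one distant point (index 5).
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10]]
    order, factor = rank_outliers(X)
    print(order.tolist())  # [5, 0, 1, 2, 3, 4]
```

In this toy example the distant point has no neighbours within the median distance, so its factor is 0 and it is ranked first. The full distance matrix costs O(n²) time and memory, so the sketch is suited only to small datasets.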
Similar resources
Outlier Detection for Support Vector Machine using Minimum Covariance Determinant Estimator
The purpose of this paper is to identify the points that affect the performance of one of the important data mining algorithms, namely the support vector machine. The final classification decision is made on the basis of a small portion of the data called support vectors. Therefore, the presence of atypical observations among these points will result in deviation from the correct decision. Thus...
A statistical test for outlier identification in data envelopment analysis
In the use of peer group data to assess individual, typical or best practice performance, the effective detection of outliers is critical for achieving useful results. In these “deterministic” frontier models, statistical theory is now mostly available. This paper deals with the statistical pared sample method and its capability of detecting outliers in data envelopment analysis. In the prese...
RNN (Reverse Nearest Neighbour) in Unproven Reserve Based Outlier Discovery
Outlier detection refers to the task of identifying patterns that do not conform to established regular behavior. Outlier detection in high-dimensional data presents various challenges resulting from the “curse of dimensionality”. The current view is that distance concentration, that is, the tendency of distances in high-dimensional data to become indiscernible, makes distance-based methods label all points a...
Example-Based DB-Outlier Detection from High Dimensional Datasets
Outlier detection is an important problem that has applications in many fields. High dimensional datasets are common in such applications. Among the existing outlier detection methods, Distance-Based outlier (DB-Outlier) detection is one of the most generalizable and simplest approaches. It finds outliers by calculating distances between data points. However, in high dimensional space, data dis...
Outlier Detection by Boosting Regression Trees
A procedure for detecting outliers in regression problems is proposed. It is based on information provided by boosting regression trees. The key idea is to select the most frequently resampled observation along the boosting iterations and reiterate after removing it. The selection criterion is based on Tchebychev’s inequality applied to the maximum over the boosting iterations of ...
Journal: Informatica, Lith. Acad. Sci.
Volume 15
Publication date: 2004